Rushing offense is one of the most heavily analyzed aspects of football, and the role of the run game has changed significantly over the past decade. While that role is becoming well defined in the NFL, college rushing attacks are used in many different ways and have yet to be thoroughly evaluated through a quantitative lens. To do this, we quantify Quarterback (QB) and Running Back (RB) impact through Expected Points Added (EPA), measuring how much each QB and RB contributes to their team's expected points and, in turn, to their team's wins. We started by gathering play-by-play rushing data and contextual information on all FBS RB carries and all designed runs and scrambles by QBs from the 2024 regular season. Using each player's EPA through the regular season, we aim to predict how many expected points they would add in the future, accounting for the conference they play in, the defenses they faced, and other situational factors such as red zone plays, the opposing rushing defense's latent ability, offensive pass strength, and home advantage.
From our framework, we were able to estimate how many points a QB or RB adds over an “average” player and over a replacement player. To calculate the value of a replacement-level RB, we first found the average number of RBs to record a carry per team in each conference. This number was between 4 and 6 across conferences, so we elected to use 4 for every conference to get a large enough sample of replacement-level players in each. We then multiplied this value by the number of teams in each conference. Using this product as a threshold on the conference's usage ranking (rushing attempts), we considered all players above the threshold to be non-replacement players and all players below it to be replacement players. Averaging our point estimates for all replacement players in each conference, we established how many points a replacement RB would add. For QBs, we used the number of teams as the threshold (e.g., the Big Ten has 18 teams, so the top 18 Big Ten QBs by usage were considered non-replacement, and all other QBs were considered replacement level). We then translated points to wins by regressing team wins on season scoring differential. Using both individual points added above replacement by conference (IPAR) and wins above replacement (WAR), we can analyze how much value QBs and RBs add through their rushing ability.
We began by loading the necessary libraries and pulling data from the cfbfastR and CollegeFootballData API packages. This includes rosters, team metadata, conference affiliations, and play-by-play (PBP) data for the 2024 FBS season. We also assigned Notre Dame to the Big Ten and UConn to the ACC rather than leaving both as Independents, which keeps the per-conference sample sizes workable.
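# Packages used throughout this analysis
library(cfbfastR)   # rosters, schedules, play-by-play, CollegeFootballData API wrappers
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggimage)    # geom_image()
library(lme4)       # lmer()
library(gt)
library(gtExtras)
# Note: the cfbd_* functions need a CollegeFootballData API key (CFBD_API_KEY)
rosters <- load_cfb_rosters(seasons = 2024)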
conferences <- cfbd_conferences()
teams <- cfbd_team_info()
pbp <- load_cfb_pbp(seasons = 2024)
teams$conference[teams$school == "Notre Dame"] <- "Big Ten"
teams$conference[teams$school == "UConn"] <- "ACC"
logos = data.frame(
conference = c("ACC","American Athletic", "Big 12","Big Ten", "Conference USA", "FBS Independents", "Mid-American", "Mountain West", "Pac-12", "SEC", "Sun Belt"),
logo = c("https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa_conf/500/1.png&transparent=true&w=30&h=30",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa_conf/500/151.png&transparent=true&w=30&h=30",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa_conf/500/4.png&transparent=true&w=30&h=30",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa_conf/500/5.png&transparent=true&w=30&h=30",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa_conf/500/12.png&transparent=true&w=30&h=30",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa_conf/500/18.png&transparent=true&w=30&h=30",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa_conf/500/15.png&transparent=true&w=30&h=30",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa_conf/500/17.png&transparent=true&w=30&h=30",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa_conf/500/9.png&transparent=true&w=30&h=30",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa_conf/500/8.png&transparent=true&w=30&h=30",
"https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa_conf/500/37.png&transparent=true&w=30&h=30")
)
We filtered the dataset to rushing plays from the regular season, ensured IDs were stored as characters, and merged roster information to attach names and positions. For QBs, we elected to keep sacks as passing plays and to count only designed runs and scrambles as rushing plays. This decision isolates a QB's latent rushing skill and impact from their ability to evade sacks, which we consider a separate skill that we do not have the data to estimate.
rushes <- pbp %>%
filter(rush == 1) %>%
filter(season_type == "regular")
rushes$rush_player_id <- as.character(rushes$rush_player_id)
It is important to consider conference when evaluating CFB metrics because not all conferences are created equal, especially within the FBS. Power 4 conferences (Big Ten, Big 12, SEC, ACC) have higher-level players and more resources to build competitive teams than the remaining conferences (MAC, Mountain West, Sun Belt, Pac-12, Conference USA, American Athletic). The first graph below shows the average point differential when these conferences play each other. The second shows that the efficiency of rushing attacks varies across FBS conferences.
games = load_cfb_schedules(2024) %>%
filter(season_type == "regular",
home_division == 'fbs',
away_division == 'fbs',
completed == TRUE,
(home_points != 0 | away_points != 0),
home_conference != away_conference)
games$home_conference[games$home_team == "Notre Dame"] = "Big Ten"
games$home_conference[games$home_team == "UConn"] = "ACC"
games$home_conference[games$home_team == "Massachusetts"] = "Mid-American"
games$away_conference[games$away_team == "Notre Dame"] = "Big Ten"
games$away_conference[games$away_team == "UConn"] = "ACC"
games$away_conference[games$away_team == "Massachusetts"] = "Mid-American"
conf_games = games %>%
mutate(point_diff = home_points - away_points) %>%
mutate(
conf_pair = ifelse(home_conference < away_conference,
paste(home_conference, away_conference, sep = "_"),
paste(away_conference, home_conference, sep = "_")),
# Flip sign if we swapped the order so that point_diff always corresponds to the first conf in the pair
point_diff_aligned = ifelse(home_conference < away_conference, point_diff, -point_diff),
conf1 = ifelse(home_conference < away_conference, home_conference, away_conference),
conf2 = ifelse(home_conference < away_conference, away_conference, home_conference)
) %>%
select(conf1,conf2,point_diff_aligned)
avg_diff <- conf_games %>%
group_by(conf1, conf2) %>%
summarize(avg_point_diff = mean(point_diff_aligned), .groups = "drop")
plot2_data_a = avg_diff %>%
rename(conf_1 = conf2,
conf_2 = conf1) %>%
rename(conf1 = conf_1,
conf2 = conf_2) %>%
mutate(avg_point_diff = -avg_point_diff)
plot2_data = avg_diff %>%
full_join(plot2_data_a) %>%
left_join(logos, by=c("conf2"="conference")) %>%
mutate(avg_point_diff = -avg_point_diff)
ggplot(plot2_data, aes(x=avg_point_diff,y=conf1,image=logo))+
geom_image(size = 0.04)+
theme_bw()+
labs(
title = "Average Point Differences Between FBS Conferences",
subtitle = "When the conferences play each other",
x = "Relative Average Point Differential",
y = "Conference"
)
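The pair-alignment step above can be hard to follow inside the `mutate()` call, so here is a minimal base R sketch on two invented games. The data frame `games_toy` and its scores are hypothetical; only the alphabetical ordering and sign-flip logic mirror the code above.

```r
# Two invented inter-conference games between the same pair of conferences.
games_toy <- data.frame(
  home_conference = c("SEC", "ACC"),
  away_conference = c("ACC", "SEC"),
  home_points     = c(31, 20),
  away_points     = c(17, 27)
)

# TRUE when the home conference sorts first alphabetically.
home_first <- games_toy$home_conference < games_toy$away_conference
point_diff <- games_toy$home_points - games_toy$away_points

games_toy$conf1 <- ifelse(home_first, games_toy$home_conference, games_toy$away_conference)
games_toy$conf2 <- ifelse(home_first, games_toy$away_conference, games_toy$home_conference)
# Flip the sign whenever the home conference ends up listed second,
# so the differential is always from conf1's point of view.
games_toy$point_diff_aligned <- ifelse(home_first, point_diff, -point_diff)

print(games_toy[, c("conf1", "conf2", "point_diff_aligned")])
print(mean(games_toy$point_diff_aligned))
```

Both toy games land in the same ACC/SEC row regardless of who hosted, which is what lets the later `group_by(conf1, conf2)` average them together.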
rbs <- rushes %>%
filter(position %in% c("RB")) %>%
group_by(pos_team, position, conference, rusher_player_name) %>%
summarize(
rush_attempts = n(),
total_rush_epa = sum(EPA, na.rm = TRUE),
epa_per_rush = mean(EPA, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(pos_team, desc(rush_attempts)) %>%
filter(!is.na(rusher_player_name))
rbsgraph <- rbs %>%
group_by(conference) %>%
summarize(
avg_epa_conf = mean(total_rush_epa, na.rm = TRUE)
) %>%
filter(conference %in% c("American Athletic", "ACC", "Big 12", "Big Ten", "Conference USA",
"Mid-American", "Mountain West", "Pac-12",
"SEC", "Sun Belt")) %>%
left_join(logos, by = "conference")
ggplot(rbsgraph)+
geom_image(aes(x = conference, y=avg_epa_conf, image = logo), position = position_jitter(height = .2))+
theme_bw()+
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank())+
labs(
title = "Average EPA Per Conference",
x = "Conference",
y = "Average Player EPA"
)
We then added features that provide context for each play: opponent conference, whether the play was in the red zone, whether the possession team was the home team, the passing strength (passing EPA per play) of each offense, the opponent's defensive stuff rate, and the point differential before the play. We also separated QBs from RBs by flagging plays where the rusher's position was QB.
rushes <- rushes %>%
left_join(teams, by = c("def_pos_team" = "school"))
rushes <- rushes %>%
rename(oppConf = conference.y)
rushes <- rushes %>%
filter(conference.x %in% c("American Athletic", "ACC", "Big 12", "Big Ten", "Conference USA",
"FBS Independents", "Mid-American", "Mountain West", "Pac-12",
"SEC", "Sun Belt"))
rushes <- rushes %>%
mutate(
qbRun = ifelse(position == "QB", 1, 0)
)
rushes <- rushes %>%
mutate(
homeTeam = ifelse(pos_team == home, 1, 0)
)
pass_strength <-
pbp |>
filter(pass == 1) %>%
dplyr::group_by(pos_team) |>
dplyr::summarise(pass_strength = mean(EPA, na.rm = TRUE))
rushes <-
rushes |>
dplyr::inner_join(pass_strength, by = "pos_team")
qbRuns <- rushes %>%
filter(qbRun == 1)
rushesfilter <- rushes %>%
filter(qbRun == 0)
rushStuffRate <- rushesfilter %>%
group_by(def_pos_team) %>%
summarise(
defStuffRate = mean(stuffed_run, na.rm = TRUE)
)
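qbStuffRate <- qbRuns %>%
group_by(def_pos_team) %>%
summarise(
defStuffRate = mean(stuffed_run, na.rm = TRUE)
)
# Attach each defense's stuff rate so the models below can use defStuffRate
rushesfilter <- rushesfilter %>%
left_join(rushStuffRate, by = "def_pos_team")
qbRuns <- qbRuns %>%
left_join(qbStuffRate, by = "def_pos_team")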
To estimate player contributions to EPA, we fit linear mixed-effects models for QB and RB runs. Random effects captured player variation, while fixed effects included home field, pass strength, red zone, defensive stuff rate, score differential, and opponent conference.
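Written out, the random-intercept specification described above looks roughly like the following. The notation is ours rather than the original authors': $i$ indexes carries, $j$ indexes rushers, $\mathbf{x}_{ij}$ collects the fixed-effect covariates (home field, pass strength, red zone, defensive stuff rate, score differential, opponent conference), and $u_j$ is the player-level intercept that is later extracted as IPA.

```latex
\mathrm{EPA}_{ij} = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

The same form is fit separately to QB runs and to non-QB runs, so each group gets its own $\boldsymbol{\beta}$, $\sigma_u^2$, and $\sigma^2$.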
qbrun_fit <-
lmer(EPA ~ 1 + (1 | rush_player_id) + homeTeam + pass_strength + rz_play +
defStuffRate + score_diff_start + oppConf,
data = qbRuns)
summary(qbrun_fit)
## Linear mixed model fit by REML ['lmerMod']
## Formula: EPA ~ 1 + (1 | rush_player_id) + homeTeam + pass_strength + rz_play +
## defStuffRate + score_diff_start + oppConf
## Data: qbRuns
##
## REML criterion at convergence: 37294.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -7.9026 -0.5489 -0.1279 0.5118 5.9193
##
## Random effects:
## Groups Name Variance Std.Dev.
## rush_player_id (Intercept) 0.02881 0.1697
## Residual 2.00820 1.4171
## Number of obs: 10505, groups: rush_player_id, 344
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.440347 0.071746 6.138
## homeTeam 0.046294 0.028640 1.616
## pass_strength 0.477783 0.144594 3.304
## rz_play 0.165533 0.035256 4.695
## defStuffRate -1.984102 0.341980 -5.802
## score_diff_start -0.001956 0.001105 -1.771
## oppConfAmerican Athletic -0.026492 0.066147 -0.401
## oppConfBig 12 -0.039382 0.064051 -0.615
## oppConfBig Ten -0.160487 0.064318 -2.495
## oppConfConference USA -0.064072 0.072469 -0.884
## oppConfMid-American -0.095162 0.070422 -1.351
## oppConfMountain West -0.012306 0.068008 -0.181
## oppConfPac-12 0.075255 0.126126 0.597
## oppConfSEC -0.154276 0.062077 -2.485
## oppConfSun Belt -0.016484 0.064561 -0.255
nonqb_run_fit <-
lmer(EPA ~ 1 + (1 | rush_player_id) + homeTeam + pass_strength + rz_play +
defStuffRate + score_diff_start + oppConf,
data = rushesfilter)
summary(nonqb_run_fit)
## Linear mixed model fit by REML ['lmerMod']
## Formula: EPA ~ 1 + (1 | rush_player_id) + homeTeam + pass_strength + rz_play +
## defStuffRate + score_diff_start + oppConf
## Data: rushesfilter
##
## REML criterion at convergence: 108528.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -8.5891 -0.5304 -0.1794 0.4201 8.0846
##
## Random effects:
## Groups Name Variance Std.Dev.
## rush_player_id (Intercept) 0.006457 0.08035
## Residual 1.238970 1.11309
## Number of obs: 35485, groups: rush_player_id, 667
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.1424635 0.0318874 4.468
## homeTeam 0.0335009 0.0122088 2.744
## pass_strength 0.3495600 0.0582140 6.005
## rz_play 0.1509072 0.0160926 9.377
## defStuffRate -1.4693202 0.1528631 -9.612
## score_diff_start -0.0027179 0.0004491 -6.052
## oppConfAmerican Athletic 0.0392331 0.0278622 1.408
## oppConfBig 12 0.0101312 0.0274082 0.370
## oppConfBig Ten -0.0075105 0.0260030 -0.289
## oppConfConference USA 0.0754016 0.0300634 2.508
## oppConfMid-American 0.0571322 0.0285406 2.002
## oppConfMountain West 0.0436696 0.0291426 1.498
## oppConfPac-12 0.0991635 0.0541290 1.832
## oppConfSEC -0.0365551 0.0264863 -1.380
## oppConfSun Belt 0.0854886 0.0271863 3.145
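To give a feel for how strongly the player intercepts are pulled toward zero, the variance components printed above can be plugged into the usual random-intercept shrinkage factor, n·σ_u² / (n·σ_u² + σ²). This is a rough sketch only: it ignores uncertainty in the fixed effects, and the helper name `shrinkage_factor` and the carry counts are illustrative choices, not part of the original analysis.

```r
# Approximate weight a random-intercept model places on a player's own carries.
# var_player and var_resid are the variance components from the summaries above.
shrinkage_factor <- function(n, var_player, var_resid) {
  (n * var_player) / (n * var_player + var_resid)
}

carries <- c(25, 100, 200)  # illustrative workloads

qb_shrink <- shrinkage_factor(carries, var_player = 0.02881,  var_resid = 2.00820)
rb_shrink <- shrinkage_factor(carries, var_player = 0.006457, var_resid = 1.238970)

print(data.frame(carries, qb = round(qb_shrink, 2), rb = round(rb_shrink, 2)))
```

Under this approximation, even a 200-carry RB keeps only about half of his raw per-carry deviation, which helps explain why the IPA values in the tables that follow are small in absolute terms.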
Using the ranef function to extract the random effect for each RB and QB, we obtained Individual Points Added (IPA) per carry for each player, then multiplied by usage (number of attempts) to compute Individual Points Added above Average (IPAA). We can now look at the top players by IPA and IPAA. The first pair of tables shows the top 25 RBs and QBs by IPA, and the second pair shows the top 10 RBs and QBs by IPAA. These results pass the eye test, as 4 of the top 5 RBs by IPA are now in the NFL.
tmp_qbrun <- ranef(qbrun_fit)
qbrun_effects <-
data.frame(
athlete_id = rownames(tmp_qbrun[["rush_player_id"]]),
ipa_qbrun = tmp_qbrun[["rush_player_id"]][,1])
tmp_run <- ranef(nonqb_run_fit)
run_effects <-
data.frame(
athlete_id= rownames(tmp_run[["rush_player_id"]]),
ipa_run = tmp_run[["rush_player_id"]][,1])
run_effects |>
dplyr::left_join(y = rosterids |> dplyr::select(athlete_id, name), by = "athlete_id") |>
dplyr::select(name, ipa_run) |>
dplyr::arrange(dplyr::desc(ipa_run)) |>
dplyr::slice_head(n = 25)
## name ipa_run
## 1 RJ Harvey 0.17472000
## 2 Ashton Jeanty 0.12974248
## 3 Jordan James 0.12574450
## 4 Woody Marks 0.12261793
## 5 Cam Skattebo 0.12043240
## 6 Omarion Hampton 0.11959551
## 7 Kalel Mullings 0.10756744
## 8 Ja'Kobi Jackson 0.10194368
## 9 Kaleb Johnson 0.10106368
## 10 Jalen Buckley 0.10009874
## 11 Jeremiyah Love 0.09371226
## 12 Xavier Terrell 0.09218207
## 13 Bhayshul Tuten 0.09124311
## 14 Josh McCray 0.09109652
## 15 Devin Neal 0.08775226
## 16 Kelley Joiner 0.08454479
## 17 Quinten Joyner 0.08168673
## 18 Jordon Simmons 0.07870897
## 19 TreVeyon Henderson 0.07856818
## 20 Shane Porter 0.07787119
## 21 A.J. Turner 0.07409267
## 22 Duke Watson 0.07128369
## 23 Lee Beebe Jr. 0.07103331
## 24 Jarquez Hunter 0.07099577
## 25 Eli Sanders 0.06963624
qbrun_effects |>
dplyr::left_join(y = rosterids |> dplyr::select(athlete_id, name), by = "athlete_id") |>
dplyr::select(name, ipa_qbrun) |>
dplyr::arrange(dplyr::desc(ipa_qbrun)) |>
dplyr::slice_head(n = 25)
## name ipa_qbrun
## 1 Gevani McCoy 0.3077138
## 2 Jacurri Brown 0.2629746
## 3 Cam Ward 0.2580059
## 4 Tony Muskett 0.2437463
## 5 Devon Dampier 0.2050801
## 6 Joey Aguilar 0.2019899
## 7 Parker Navarro 0.1934183
## 8 Rocco Becht 0.1799118
## 9 Bryson Barnes 0.1778851
## 10 Hajj-Malik Williams 0.1752843
## 11 Hunter Herring 0.1735728
## 12 Gio Lopez 0.1662244
## 13 Hunter Watson 0.1651555
## 14 Tyler Huff 0.1633444
## 15 Jalen Milroe 0.1603738
## 16 Blake Horvath 0.1552780
## 17 Owen McCown 0.1458270
## 18 Maddux Madsen 0.1421164
## 19 Riley Leonard 0.1373716
## 20 Drew Allar 0.1263010
## 21 Hank Bachmeier 0.1260176
## 22 Jordan McCloud 0.1241058
## 23 Malachi Singleton 0.1232910
## 24 Brady Cook 0.1204425
## 25 William Watson III 0.1180794
qbrun_ipaa <-
qbRuns |>
dplyr::group_by(rush_player_id) |>
dplyr::summarise(n_qbrun = dplyr::n()) |>
dplyr::rename(athlete_id = rush_player_id) |>
dplyr::left_join(y = qbrun_effects, by = "athlete_id") |>
dplyr::mutate(ipaa_qbrun = n_qbrun * ipa_qbrun) |>
dplyr::left_join(y = rosterids |> dplyr::select(athlete_id, name, position), by = "athlete_id")
run_ipaa <-
rushesfilter |>
dplyr::group_by(rush_player_id) |>
dplyr::summarise(n_run = dplyr::n()) |>
dplyr::rename(athlete_id = rush_player_id) |>
dplyr::left_join(y = run_effects, by = "athlete_id") |>
dplyr::mutate(ipaa_run = n_run * ipa_run) |>
dplyr::left_join(y = rosterids |> dplyr::select(athlete_id, name, position), by = "athlete_id")
run_ipaa |>
dplyr::select(name, ipaa_run) |>
dplyr::arrange(dplyr::desc(ipaa_run)) |>
dplyr::slice_head(n = 10)
## # A tibble: 10 × 2
## name ipaa_run
## <chr> <dbl>
## 1 Ashton Jeanty 44.2
## 2 RJ Harvey 40.0
## 3 Omarion Hampton 33.1
## 4 Cam Skattebo 31.1
## 5 Jordan James 28.2
## 6 Woody Marks 24.5
## 7 Kaleb Johnson 24.0
## 8 Kalel Mullings 19.7
## 9 Devin Neal 19.0
## 10 Bhayshul Tuten 16.6
qbrun_ipaa |>
dplyr::select(name, ipaa_qbrun) |>
dplyr::arrange(dplyr::desc(ipaa_qbrun)) |>
dplyr::slice_head(n = 10)
## # A tibble: 10 × 2
## name ipaa_qbrun
## <chr> <dbl>
## 1 Tyler Huff 32.0
## 2 Devon Dampier 27.5
## 3 Parker Navarro 25.0
## 4 Jalen Milroe 21.8
## 5 Hajj-Malik Williams 21.6
## 6 Hunter Watson 20.8
## 7 Blake Horvath 18.8
## 8 Riley Leonard 15.4
## 9 Brendon Lewis 15.1
## 10 Rocco Becht 13.7
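As a quick sanity check on the IPAA definition (attempts times IPA), the two leading RBs can be recomputed by hand. The IPA values are copied from the top-25 table above and the attempt counts from the WAR output later in this post; the data frame name `rb_check` is just for illustration.

```r
# Values copied from the tables in this post; nothing here is newly estimated.
rb_check <- data.frame(
  name    = c("Ashton Jeanty", "RJ Harvey"),
  ipa_run = c(0.12974248, 0.17472000),  # per-carry random effect (IPA)
  n_run   = c(341, 229)                 # rushing attempts
)

# IPAA = attempts * IPA
rb_check$ipaa_run <- rb_check$n_run * rb_check$ipa_run
print(rb_check)
```

Harvey has the higher per-carry IPA, but Jeanty's much larger workload puts him ahead once usage is taken into account, which matches the ordering in the IPAA table.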
Our IPAA calculation indicates how many points an RB or QB added relative to the global average intercept for the full population of RBs or QBs. Our next step was to identify a “replacement” level player within each conference, to better estimate a player's impact above or below that level. We did this by conference rather than with a single FBS-wide threshold. RB thresholds were based on the average number of RBs to appear for teams in each conference. This value ranged from 4 to 6 across conferences, but to get a large enough sample for each conference and maintain consistency, we set the threshold to 4 × the number of teams; QB thresholds equaled the number of teams in the conference. Players below these thresholds were considered replacement level. We averaged the IPA of all players falling below the conference-level thresholds, separately for QBs and RBs, to determine our next metric, Individual Points Above Replacement (IPAR). IPAR is calculated by subtracting from a player's IPAA the contribution of a replacement player given the same workload (number of attempts × average replacement IPA in that conference).
confrbs <- rbs %>%
group_by(conference, pos_team) %>%
summarise(
RBS = n()
) %>%
group_by(conference) %>%
summarise(
teams = n(),
avgRBS = mean(RBS, na.rm = TRUE)
) %>%
filter(conference %in% c("American Athletic", "ACC", "Big 12", "Big Ten", "Conference USA",
"FBS Independents", "Mid-American", "Mountain West", "Pac-12",
"SEC", "Sun Belt")) %>%
mutate(
slicenum = teams * 4
)
#RB IPAR calculation
run_ipaa <- run_ipaa%>%
left_join(teaminfo, by = "athlete_id")
run_ipaa <- run_ipaa %>%
left_join(confrbs, by =c("conference" = "conference")) %>%
filter(!is.na(conference))
run_ipaa <- run_ipaa %>%
filter(position.x == "RB")
run_ipaa <- run_ipaa %>%
group_by(conference) %>%
arrange(desc(n_run), .by_group = TRUE) %>%
mutate(
threshold = nth(n_run, first(slicenum)), # get the slicenum for this conf
repl_rb = if_else(n_run <= threshold, 1, 0)
) %>%
ungroup()
repl_rb_ipa_run <- run_ipaa %>%
group_by(conference) %>%
filter(repl_rb == 1) %>%
summarise(
avg_repl_ipa_run = mean(ipa_run, na.rm = TRUE),
.groups = "drop"
)
run_ipaa <- run_ipaa %>%
left_join(repl_rb_ipa_run, by = "conference")
run_ipar <-
run_ipaa |>
dplyr::mutate(
shadow_run = dplyr::case_when(
position.x == "RB" ~ n_run * avg_repl_ipa_run),
ipar_run = ipaa_run - shadow_run)
#QB IPAR Calculation
qbrun_ipaa <- qbrun_ipaa%>%
left_join(teaminfo, by = "athlete_id")
qbrun_ipaa <- qbrun_ipaa %>%
left_join(confrbs, by =c("conference" = "conference")) %>%
filter(!is.na(conference)) %>%
mutate(
slicenum = teams
)
qbrun_ipaa <- qbrun_ipaa %>%
group_by(conference) %>%
arrange(desc(n_qbrun), .by_group = TRUE) %>%
mutate(
threshold = nth(n_qbrun, first(slicenum)), # get the slicenum for this conf
repl_qb = if_else(n_qbrun <= threshold, 1, 0)
) %>%
ungroup()
repl_qb_ipa_run <-
qbrun_ipaa |>
group_by(conference) %>%
dplyr::filter(repl_qb == 1) |>
summarise(
avg_repl_ipa_qbrun = mean(ipa_qbrun, na.rm = TRUE),
.groups = "drop"
)
qbrun_ipaa <- qbrun_ipaa %>%
left_join(repl_qb_ipa_run, by = "conference")
qbrun_ipar <-
qbrun_ipaa |>
dplyr::filter(position.x %in% c("QB")) |>
dplyr::mutate(
shadow_qbrun = dplyr::case_when(
position.x == "QB" ~ n_qbrun * avg_repl_ipa_qbrun),
qb_ipar = ipaa_qbrun - shadow_qbrun)
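The threshold and IPAR steps above can be traced on a toy conference. The sketch below uses base R and invented numbers (the `rbs_toy` data frame, a two-team conference, and all IPA values are hypothetical); it mirrors the ranking rule and the `<=` comparison used above, but it is not the authors' pipeline.

```r
# Hypothetical RBs from a single two-team conference (all values invented).
rbs_toy <- data.frame(
  player  = paste0("RB", 1:10),
  n_run   = c(220, 180, 150, 120, 90, 60, 45, 30, 20, 10),
  ipa_run = c(0.10, 0.06, 0.02, 0.04, -0.01, 0.01, -0.02, 0.00, -0.03, -0.01)
)
n_teams  <- 2
slicenum <- n_teams * 4  # same 4-per-team rule as the RB threshold above

# Carry count of the slicenum-th busiest back.
# Indexing past the end returns NA, e.g. if a conference had fewer backs than slicenum.
threshold <- sort(rbs_toy$n_run, decreasing = TRUE)[slicenum]
rbs_toy$repl_rb <- as.integer(rbs_toy$n_run <= threshold)

# Average per-carry IPA of the replacement group.
avg_repl_ipa <- mean(rbs_toy$ipa_run[rbs_toy$repl_rb == 1])

# IPAR = IPAA minus what a replacement back would add on the same workload.
rbs_toy$ipaa_run <- rbs_toy$n_run * rbs_toy$ipa_run
rbs_toy$ipar_run <- rbs_toy$ipaa_run - rbs_toy$n_run * avg_repl_ipa

print(rbs_toy[, c("player", "n_run", "repl_rb", "ipar_run")])
```

Two details are worth noticing. Because the comparison is `<=`, the back sitting exactly at the threshold is counted as replacement level, so this toy conference ends up with 7 non-replacement backs rather than 8. And if a conference had fewer qualifying players than `slicenum`, the threshold would be `NA` and no replacement group would be formed, so that edge case may deserve an explicit check.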
Now that we had determined how many points RBs and QBs add on the ground above a replacement player, our next goal was to connect this to the wins these players added over a replacement player. We calculated each FBS team's scoring differential across its 2024 regular-season games and regressed team wins on that differential. The slope of this regression gives a coefficient for converting IPAR from the points scale to the wins scale. After fitting the model, we multiplied each RB's and QB's IPAR by this coefficient, giving us our final Wins Above Replacement (WAR) metric.
games = load_cfb_schedules(2024) %>%
filter(season_type == "regular",
home_division == 'fbs',
away_division == 'fbs',
completed == TRUE,
(home_points != 0 | away_points != 0))
conf = games %>%
select(season,home_team,home_conference) %>%
rename(team = home_team, conf = home_conference) %>%
distinct(season,team,conf)
games = games %>%
mutate(result = home_points-away_points,
win_t = ifelse(result > 0, home_team, away_team),
lose_t = ifelse(result < 0, home_team, away_team),
win_by = ifelse(win_t == home_team, result, -1*result),
lose_by = ifelse(lose_t == home_team, result, -1*result) ) %>%
select(season,win_t,win_by,lose_t,lose_by,result)
win_diff =
games %>%
dplyr::group_by(season, win_t) %>%
dplyr::summarise(wins = dplyr::n(), win_diff = sum(win_by), .groups = 'drop') %>%
dplyr::rename(team = win_t)
loss_diff =
games %>%
dplyr::group_by(season, lose_t) %>%
dplyr::summarise(loss = dplyr::n(), loss_diff = sum(lose_by), .groups = 'drop') %>%
dplyr::rename(team = lose_t)
records =
win_diff %>%
dplyr::full_join(y = loss_diff, by = c("season", "team")) %>%
dplyr::mutate(across(where(is.numeric), ~replace_na(.x, 0)),  # only numeric columns can take a 0 fill
scoring_diff = win_diff + loss_diff) %>%
left_join(conf, by = c("season","team"))
win_score_fit = lm(wins~scoring_diff, data = records)
points_to_win = coefficients(win_score_fit)[2] %>% unname()
rbwars <-
unique(c(run_ipar$athlete_id))
qbwars <-
unique(c(qbrun_ipar$athlete_id))
skill_war <-
data.frame(athlete_id = rbwars) |>
dplyr::left_join(y = rosterids |> dplyr::select(athlete_id, name, position, team), by = "athlete_id") |>
dplyr::left_join(y = run_ipar |> dplyr::select(athlete_id, n_run, ipar_run), by = "athlete_id") |>
tidyr::replace_na(list(ipar_run=0)) |>
dplyr::mutate(
war_run = ipar_run * points_to_win)
qbwar <-
data.frame(athlete_id = qbwars) |>
dplyr::left_join(y = rosterids |> dplyr::select(athlete_id, name, position, team), by = "athlete_id") |>
dplyr::left_join(y = qbrun_ipar |> dplyr::select(athlete_id, n_qbrun, qb_ipar), by = "athlete_id") |>
tidyr::replace_na(list(qb_ipar=0)) |>
dplyr::mutate(
war_qbrun = qb_ipar * points_to_win)
skill_war %>%
arrange(desc(war_run)) %>%
slice_head(n=10)
## athlete_id name position team n_run ipar_run war_run
## 1 4890973 Ashton Jeanty RB Boise State 341 42.53183 0.8478808
## 2 4568490 RJ Harvey RB UCF 229 41.19189 0.8211688
## 3 4685382 Omarion Hampton RB North Carolina 277 34.53606 0.6884836
## 4 4696981 Cam Skattebo RB Arizona State 258 32.40213 0.6459431
## 5 4685397 Jordan James RB Oregon 224 28.39401 0.5660406
## 6 4429059 Woody Marks RB USC 200 24.72648 0.4929276
## 7 4819231 Kaleb Johnson RB Iowa 237 24.19253 0.4822831
## 8 4682652 Devin Neal RB Kansas 217 20.16136 0.4019209
## 9 4429121 Kalel Mullings RB Michigan 183 19.87049 0.3961224
## 10 4882093 Bhayshul Tuten RB Virginia Tech 182 17.53143 0.3494927
qbwar %>%
arrange(desc(war_qbrun)) %>%
slice_head(n=10)
## athlete_id name position team n_qbrun qb_ipar
## 1 4574356 Tyler Huff QB Jacksonville State 196 38.36556
## 2 5105849 Devon Dampier QB New Mexico 134 36.66281
## 3 4571364 Hajj-Malik Williams QB UNLV 123 29.98829
## 4 4610131 Parker Navarro QB Ohio 129 27.91197
## 5 5220424 Hunter Watson QB Sam Houston 126 24.89178
## 6 4429149 Brendon Lewis QB Nevada 132 24.10808
## 7 4432734 Jalen Milroe QB Alabama 136 21.42414
## 8 5081504 Blake Horvath QB Navy 121 20.48109
## 9 4683423 Riley Leonard QB Notre Dame 112 16.45659
## 10 4801299 Rocco Becht QB Iowa State 76 15.31451
## war_qbrun
## 1 0.7648254
## 2 0.7308808
## 3 0.5978229
## 4 0.5564310
## 5 0.4962227
## 6 0.4805995
## 7 0.4270948
## 8 0.4082948
## 9 0.3280656
## 10 0.3052980
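The fitted value of `points_to_win` is not printed above, but it can be backed out from the output, since WAR is just IPAR times that coefficient. The short check below uses the top two RB rows copied from the table; the variable names are illustrative.

```r
# IPAR and WAR for the top two RBs, copied from the output above.
ipar_run <- c(jeanty = 42.53183, harvey = 41.19189)
war_run  <- c(jeanty = 0.8478808, harvey = 0.8211688)

# WAR = IPAR * points_to_win, so the ratio recovers the regression slope.
implied_points_to_win <- war_run / ipar_run
points_per_win <- 1 / implied_points_to_win

print(round(implied_points_to_win, 4))
print(round(points_per_win, 1))
```

Both rows imply the same slope of roughly 0.02 wins per point, i.e. about 50 points of season scoring differential per additional win in this regression.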
The final tables show the RBs and QBs who contributed most to team wins through rushing in 2024.
rbwarout <- skill_war %>%
left_join(rosters, by = "athlete_id")
rbwarinfo <- rbwarout %>%
left_join(teams, by = c("team.x" = "school"))
rbwar <- rbwarout %>%
dplyr::select(
name,
headshot_url,
team.x,
n_run,
ipar_run,
war_run
) %>%
mutate(
ipar_run = round(ipar_run, 2),
war_run = round(war_run, 2)
) %>%
arrange(desc(war_run)) %>%
slice_head(n=10) %>%
gt() %>%
cols_align(align = "center") %>%
cols_label(
name = "Name",
headshot_url = "",
team.x = "Team",
n_run = "Rush Attempts",
ipar_run = "IPAR",
war_run = "Rush War"
) %>%
tab_header(
title = "CFB RB WAR",
subtitle = "Data and Output From 2024 Regular Season"
) %>%
gtExtras::gt_theme_538()
rbwar <- gt_hulk_col_numeric(rbwar, column = "war_run")
rbwar <- gt_img_rows(rbwar, column = "headshot_url")
rbwar
CFB RB WAR
Data and Output From 2024 Regular Season

| Name | Team | Rush Attempts | IPAR | Rush War |
|---|---|---|---|---|
| Ashton Jeanty | Boise State | 341 | 42.53 | 0.85 |
| RJ Harvey | UCF | 229 | 41.19 | 0.82 |
| Omarion Hampton | North Carolina | 277 | 34.54 | 0.69 |
| Cam Skattebo | Arizona State | 258 | 32.40 | 0.65 |
| Jordan James | Oregon | 224 | 28.39 | 0.57 |
| Woody Marks | USC | 200 | 24.73 | 0.49 |
| Kaleb Johnson | Iowa | 237 | 24.19 | 0.48 |
| Devin Neal | Kansas | 217 | 20.16 | 0.40 |
| Kalel Mullings | Michigan | 183 | 19.87 | 0.40 |
| Bhayshul Tuten | Virginia Tech | 182 | 17.53 | 0.35 |
qbwarout <- qbwar %>%
left_join(rosters, by = "athlete_id")
qbwartbl <- qbwarout %>%
dplyr::select(
name,
headshot_url,
team.x,
n_qbrun,
qb_ipar,
war_qbrun
) %>%
mutate(
qb_ipar = round(qb_ipar, 2),
war_qbrun = round(war_qbrun, 2)
) %>%
arrange(desc(war_qbrun)) %>%
slice_head(n=10) %>%
gt() %>%
cols_align(align = "center") %>%
cols_label(
name = "Name",
headshot_url = "",
team.x = "Team",
n_qbrun = "Rush Attempts",
qb_ipar = "IPAR",
war_qbrun = "Rush War"
) %>%
tab_header(
title = "CFB QB WAR on Designed Runs and Scrambles",
subtitle = "Data and Output From 2024 Regular Season"
) %>%
gtExtras::gt_theme_538()
qbwartbl <- gt_hulk_col_numeric(qbwartbl, column = "war_qbrun")
qbwartbl <- gt_img_rows(qbwartbl, column = "headshot_url")
qbwartbl
CFB QB WAR on Designed Runs and Scrambles
Data and Output From 2024 Regular Season

| Name | Team | Rush Attempts | IPAR | Rush War |
|---|---|---|---|---|
| Tyler Huff | Jacksonville State | 196 | 38.37 | 0.76 |
| Devon Dampier | New Mexico | 134 | 36.66 | 0.73 |
| Hajj-Malik Williams | UNLV | 123 | 29.99 | 0.60 |
| Parker Navarro | Ohio | 129 | 27.91 | 0.56 |
| Hunter Watson | Sam Houston | 126 | 24.89 | 0.50 |
| Brendon Lewis | Nevada | 132 | 24.11 | 0.48 |
| Jalen Milroe | Alabama | 136 | 21.42 | 0.43 |
| Blake Horvath | Navy | 121 | 20.48 | 0.41 |
| Riley Leonard | Notre Dame | 112 | 16.46 | 0.33 |
| Rocco Becht | Iowa State | 76 | 15.31 | 0.31 |
A few limitations of this model need to be noted. This WAR metric accounts only for rushing, not passing or receiving, due to the lack of play-by-play data available at the college level. Because so few teams are independent, Notre Dame was assigned to the Big Ten and UConn to the ACC. Replacement thresholds are heuristics; applying a different definition of replacement could change the results. The linear mixed-effects model does not capture interactions such as QB mobility versus defensive scheme. Only regular season statistics were used in this analysis.
In the future, we are interested in incorporating blocking and offensive line performance into our model to account for how offensive line play affects rushing production. With more detail in the play-by-play dataset (run direction, shotgun, personnel), our results could account for more of the play-calling and structural elements of college run games. Receiving and passing plays are another key piece that should be included for a full evaluation of the offense; however, we do not currently have enough data to model them accurately. One direction available to us now is to compare year-over-year changes rather than looking at a single season of rushing data. Furthermore, this analysis can be used to project rushers across different conferences, which could aid transfer portal evaluation.